Similarity Join in Metric Spaces Using eD-Index

Authors

  • Vlastislav Dohnal
  • Claudio Gennaro
  • Pavel Zezula
Abstract

Similarity join in distance spaces constrained by the metric postulates is a necessary complement to the better-known similarity range and nearest neighbor search primitives. However, the quadratic computational complexity of similarity joins prevents their application to large data collections. We present the eD-Index, an extension of the D-Index, and study its use in implementing two algorithms for similarity self joins: the range query join and the overloading join. Although these approaches cannot eliminate the intrinsic quadratic complexity of similarity joins, experiments confirm significant performance improvements.
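
To make the quadratic cost mentioned in the abstract concrete, the following is a minimal sketch of a naive nested-loop similarity self join. It is a baseline for illustration only, not the eD-Index, the range query join, or the overloading join. The function name `naive_similarity_self_join`, the threshold parameter `eps`, the use of `<=` as the join condition, and the choice of Euclidean distance are all assumptions made for this example.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def naive_similarity_self_join(
    objects: Sequence[T],
    dist: Callable[[T, T], float],
    eps: float,
) -> Tuple[List[Tuple[int, int]], int]:
    """Return (index pairs with dist <= eps, number of distance computations).

    Hypothetical baseline: every unordered pair is compared once, so the
    number of distance computations is n * (n - 1) / 2.
    """
    pairs: List[Tuple[int, int]] = []
    computations = 0
    for i, j in combinations(range(len(objects)), 2):
        computations += 1
        if dist(objects[i], objects[j]) <= eps:
            pairs.append((i, j))
    return pairs, computations


# Example data (made up for illustration): four points in the plane,
# compared with the Euclidean metric.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
result_pairs, n_computations = naive_similarity_self_join(points, math.dist, 1.5)
print(result_pairs)     # [(0, 1), (2, 3)]
print(n_computations)   # 6
```

For four objects the baseline already needs six distance computations. The count grows as n(n - 1)/2, which is the cost that index-based approaches try to reduce by avoiding some of these comparisons.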

Similar resources

Access Structures for Advanced Similarity Search in Metric Spaces

Similarity retrieval is an important paradigm for searching in environments where exact match has little meaning. Moreover, in order to enlarge the set of data types for which the similarity search can efficiently be performed, the notion of mathematical metric space provides a useful abstraction for similarity. In this paper we consider the problem of organizing and searching large data-sets f...

A content-addressable network for similarity join in metric spaces

Similarity join is an interesting complement of the well-established similarity range and nearest neighbors search primitives in metric spaces. However, the quadratic computational complexity of similarity join prevents its application to large data collections. We present MCAN+, an extension of MCAN (a Content-Addressable Network for metric objects) to support similarity self join queries. The...

Database Similarity Join for Metric Spaces

Similarity Joins are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Join as a physical database operator. In this paper, we focus on the study, design and im...

Solving similarity joins and range queries in metric spaces with the list of twin clusters

The metric space model abstracts many proximity or similarity problems, where the most frequently considered primitives are range and k-nearest neighbor search, leaving out the similarity join, an extremely important primitive. In fact, despite the great attention that this primitive has received in traditional and even multidimensional databases, little has been done for general metric databas...

GPU Accelerated Self-join for the Distance Similarity Metric

The self-join finds all objects in a dataset within a threshold of each other defined by a similarity metric. As such, the self-join is a building block for the field of databases and data mining, and is employed in Big Data applications. In this paper, we advance a GPU-efficient algorithm for the similarity self-join that uses the Euclidean distance metric. The search-and-refine strategy is an...

Publication date: 2003